Non-Rationalised Economics NCERT Notes, Solutions and Extra Q & A (Class 9th to 12th)

Chapter 3 Organisation Of Data

1. Introduction

After learning how to collect data in the previous chapter, the next logical step is to understand how to organize it. The data we first collect, known as raw data, is often chaotic, disorganized, and difficult to comprehend, much like a pile of assorted items at a local junk dealer's (kabadiwallah's) shop.

Just as a kabadiwallah sorts his junk into categories like glass, plastic, and metal to manage his business efficiently, a statistician must organize raw data into a structured format. This process is called classification. The primary purpose of classifying data is to arrange it into groups based on common characteristics, which brings order to the information and makes it suitable for further statistical analysis and interpretation.

2. Raw Data

Raw data is the term used for data in its original, unorganized form, exactly as it was collected. This type of data is often large, unwieldy, and confusing. Trying to draw meaningful conclusions directly from a large set of raw data is an extremely tedious and often impossible task.

For instance, look at the following raw data representing the mathematics marks of 100 students.

47	45	10	60	51	56	66	100	49	40
60	59	56	55	62	48	59	55	51	41
42	69	64	66	50	59	57	65	62	50
64	30	37	75	17	56	20	14	55	90
62	51	55	14	25	34	90	49	56	54
70	47	49	82	40	82	60	85	65	66
49	44	64	69	70	48	12	28	55	65
49	40	25	41	71	80	0	56	14	22
66	53	46	70	43	61	59	12	30	35
45	44	57	76	82	39	32	14	90	25

From this table, it is difficult to quickly determine key information, such as the highest or lowest score, the average performance, or how many students passed. To make sense of this data, it must be organized and summarized through classification. This process makes the data comprehensible and allows for easy location of information, comparison, and inference.

3. Classification of Data

Classification is the process of arranging or organizing data into groups or classes based on some shared criteria or characteristic. The method of classification depends entirely on the purpose of the study. There are four primary types of classification:

Chronological Classification: Data is classified based on time. The arrangement can be in ascending or descending order with respect to years, months, weeks, or any other time period. This type of data is also known as a time series.

Example 1. Population of India from 1951 to 2011.

Year	Population (Crores)
1951	35.7
1961	43.8
1971	54.6
1981	68.4
1991	81.8
2001	102.7
2011	121.0

Spatial (Geographical) Classification: Data is classified based on geographical location, such as countries, states, cities, or districts.

Example 2. Yield of Wheat for Different Countries (2013).

Country	Yield (kg/hectare)
Canada	3594
China	5055
France	7254
India	3154
Pakistan	2787

Qualitative Classification: Data is classified based on descriptive characteristics or attributes that cannot be measured numerically. Examples include gender, religion, nationality, or literacy. The classification is done based on the presence or absence of an attribute.

Example 3. Population classified by gender and marital status.

This is a manifold classification: the population is first divided into two groups (male and female), and each group is then subdivided by a second attribute (married and unmarried).

Quantitative Classification: Data is classified based on characteristics that can be measured numerically, such as height, weight, age, income, or marks. When such data is grouped into classes, it forms a quantitative classification, typically presented as a frequency distribution.

4. Variables: Continuous and Discrete

A variable is a characteristic that can be measured and whose value changes from one observation to another. Variables can be broadly classified into two types:

Continuous Variable: A variable that can take any numerical value within a given range. This includes integers, fractions, and decimals. Its value can change in infinitely small gradations.
- Examples: Height, weight, time, distance, temperature. A person's height does not jump from 150 cm to 151 cm; it passes through every possible value in between, such as 150.1 cm, 150.11 cm, etc.
Discrete Variable: A variable that can only take specific, distinct values and "jumps" from one value to the next without taking any intermediate values. These are typically values that can be counted in whole numbers.
- Examples: The number of students in a class, the number of cars on a road, or the number shown on a rolled die. You can have 25 or 26 students, but not 25.5 students.

5. What Is a Frequency Distribution?

A frequency distribution is a concise and comprehensive way to classify the raw data of a quantitative variable. It is a table that organizes data by grouping it into classes and showing the number of observations (frequency) that fall into each class.

Key Components of a Frequency Distribution:

Class Limits: These are the two endpoints of a class. The lowest value is the Lower Class Limit, and the highest value is the Upper Class Limit. For the class 60–70, 60 is the lower limit and 70 is the upper limit.
Class Interval (or Class Width): This is the difference between the upper class limit and the lower class limit. For the class 60–70, the interval is $70 - 60 = 10$.
Class Mark (or Class Mid-Point): This is the middle value of a class, calculated as:
$\text{Class Mark} = (\text{Upper Class Limit} + \text{Lower Class Limit}) \div 2$

The class mark is used to represent the entire class in further statistical calculations.
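A quick way to check these definitions is to compute them for the class 60–70. The helper names `class_interval` and `class_mark` are hypothetical; the functions only restate the two formulas above in Python.

```python
def class_interval(lower, upper):
    """Class interval (width) = upper class limit - lower class limit."""
    return upper - lower


def class_mark(lower, upper):
    """Class mark (mid-point) = (upper class limit + lower class limit) / 2."""
    return (upper + lower) / 2


# The class 60-70 discussed above.
print(class_interval(60, 70))  # 10
print(class_mark(60, 70))      # 65.0
```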

How to Prepare a Frequency Distribution?

Constructing a frequency distribution involves making several key decisions:

Should class intervals be equal or unequal?
Equal intervals are generally preferred for simplicity and ease of comparison. However, unequal intervals are used when data is highly skewed, such as income data, where a few individuals may have very high incomes. Using equal intervals in such cases would either create too many classes or mask important details at the lower or upper ends.
How many classes should there be?
There is no fixed rule, but typically, the number of classes is kept between 6 and 15. Too few classes can hide important patterns, while too many can be as confusing as the raw data itself.
What should be the size of each class?
The size (or width) of the class interval is linked to the number of classes and the range of the data (Range = Largest Value - Smallest Value). Once the desired number of classes is decided, the approximate size of the class interval can be found by dividing the range by the number of classes.
How should class limits be determined?
This involves choosing between two methods:
- Inclusive Method: Both the lower and upper limits of a class are included in that class itself (e.g., 0-9, 10-19, 20-29). This method is often used for discrete variables. A notable feature is the "gap" between the upper limit of one class and the lower limit of the next (here, between 9 and 10).
- Exclusive Method: The upper limit of a class is excluded, and any observation equal to the upper limit is included in the next class (e.g., 0-10, 10-20, 20-30). In the class 10-20, values from 10 up to (but not including) 20 are included. This method is preferred for continuous variables as it ensures continuity in the data.
How is the frequency for each class found?
This is done by going through the raw data and using tally marks to count how many observations fall into each class. A tally mark (|) is placed against a class for each observation. For ease of counting, tallies are grouped in fives: four vertical strokes followed by a fifth stroke drawn diagonally across them. The total number of tally marks for a class is its frequency.
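The decisions above can be tried out on the 100 marks from the raw-data table. The sketch below is a minimal Python illustration, not a prescribed procedure: it assumes equal class intervals, the exclusive method, and that the top mark (100) is counted in the last class 90-100 rather than opening a class of its own. The helper names `frequency_distribution` and `tally` are hypothetical.

```python
import math

# The 100 mathematics marks from the raw-data table earlier in this chapter.
MARKS = [
    47, 45, 10, 60, 51, 56, 66, 100, 49, 40,
    60, 59, 56, 55, 62, 48, 59, 55, 51, 41,
    42, 69, 64, 66, 50, 59, 57, 65, 62, 50,
    64, 30, 37, 75, 17, 56, 20, 14, 55, 90,
    62, 51, 55, 14, 25, 34, 90, 49, 56, 54,
    70, 47, 49, 82, 40, 82, 60, 85, 65, 66,
    49, 44, 64, 69, 70, 48, 12, 28, 55, 65,
    49, 40, 25, 41, 71, 80, 0, 56, 14, 22,
    66, 53, 46, 70, 43, 61, 59, 12, 30, 35,
    45, 44, 57, 76, 82, 39, 32, 14, 90, 25,
]


def frequency_distribution(data, num_classes):
    """Group integer data into equal, exclusive classes [lower, upper).

    Class size = Range / number of classes, rounded up. Assumption: the
    largest value is counted in the last class instead of opening a new one.
    """
    smallest, largest = min(data), max(data)
    width = math.ceil((largest - smallest) / num_classes)
    counts = [0] * num_classes
    for x in data:
        # Exclusive method: x falls in the class with lower <= x < upper.
        index = min((x - smallest) // width, num_classes - 1)
        counts[index] += 1
    return [
        ((smallest + i * width, smallest + (i + 1) * width), counts[i])
        for i in range(num_classes)
    ]


def tally(count):
    """Tally marks in groups of five, with '||||/' standing for a crossed bundle."""
    groups = ["||||/"] * (count // 5) + ["|" * (count % 5)]
    return " ".join(g for g in groups if g)


for (lower, upper), freq in frequency_distribution(MARKS, 10):
    print(f"{lower}-{upper}\t{tally(freq)}\t{freq}")
```

Here the range is 100 - 0 = 100, so 10 classes give a class size of 10. The resulting frequencies for the classes 0-10 through 90-100 are 1, 8, 6, 7, 21, 23, 19, 6, 5 and 4, which add up to the 100 students.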

Adjustment in Class Interval

When using the inclusive method for a continuous variable, the "gap" between classes needs to be removed to ensure continuity. This is done through an adjustment:

Find the difference between the lower limit of a class and the upper limit of the preceding class. (e.g., in a series 800-899, 900-999, the difference is $900-899 = 1$).
Divide this difference by two (e.g., $1 \div 2 = 0.5$).
Subtract this value from all lower limits and add it to all upper limits. (e.g., 800-899 becomes 799.5-899.5).
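The three adjustment steps can be written as a small function. This is a sketch that assumes the classes are listed in ascending order, share the same gap, and number at least two; `adjust_inclusive_classes` is a hypothetical name.

```python
def adjust_inclusive_classes(classes):
    """Remove the gaps between inclusive classes so that they become continuous.

    `classes` is a list of (lower, upper) limits in ascending order with at
    least two classes, e.g. [(800, 899), (900, 999)].
    """
    # Step 1: difference between a lower limit and the preceding upper limit.
    gap = classes[1][0] - classes[0][1]
    # Step 2: half of that difference is the correction factor.
    correction = gap / 2
    # Step 3: subtract it from every lower limit and add it to every upper limit.
    return [(lower - correction, upper + correction) for lower, upper in classes]


print(adjust_inclusive_classes([(800, 899), (900, 999)]))
# [(799.5, 899.5), (899.5, 999.5)]
```

After adjustment the upper limit of each class equals the lower limit of the next, so the classes can be handled like exclusive classes.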

Loss of Information

A major drawback of classifying data into a frequency distribution is the loss of information. Once raw data is grouped into classes, the individual values of the observations are lost. All subsequent calculations are based on the class mark, which is an assumed representative value for all observations in that class. This is an approximation, but it is a necessary trade-off for making large datasets comprehensible and manageable.
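To see the loss of information in numbers, one can compare the arithmetic mean of the raw marks with the mean obtained when every mark is replaced by its class mark. The sketch below assumes the classes 0-10, 10-20, ..., 90-100 under the exclusive method, with the top mark 100 placed in 90-100; `class_mark_of` is a hypothetical helper.

```python
# The 100 mathematics marks from the raw-data table earlier in this chapter.
MARKS = [
    47, 45, 10, 60, 51, 56, 66, 100, 49, 40,
    60, 59, 56, 55, 62, 48, 59, 55, 51, 41,
    42, 69, 64, 66, 50, 59, 57, 65, 62, 50,
    64, 30, 37, 75, 17, 56, 20, 14, 55, 90,
    62, 51, 55, 14, 25, 34, 90, 49, 56, 54,
    70, 47, 49, 82, 40, 82, 60, 85, 65, 66,
    49, 44, 64, 69, 70, 48, 12, 28, 55, 65,
    49, 40, 25, 41, 71, 80, 0, 56, 14, 22,
    66, 53, 46, 70, 43, 61, 59, 12, 30, 35,
    45, 44, 57, 76, 82, 39, 32, 14, 90, 25,
]

WIDTH = 10        # classes 0-10, 10-20, ..., 90-100
NUM_CLASSES = 10


def class_mark_of(x):
    """Return the class mark that represents x once the data is grouped.

    Exclusive classes are assumed, with the top mark 100 kept in 90-100.
    """
    index = min(x // WIDTH, NUM_CLASSES - 1)
    lower = index * WIDTH
    upper = lower + WIDTH
    return (lower + upper) / 2


exact_mean = sum(MARKS) / len(MARKS)
grouped_mean = sum(class_mark_of(x) for x in MARKS) / len(MARKS)
print(f"Mean of raw marks:   {exact_mean:.2f}")
print(f"Mean of class marks: {grouped_mean:.2f}")
```

For these marks the raw mean is 51.29 while the class-mark mean is 52.20; the difference comes entirely from treating every observation in a class as if it were equal to the class mark.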

Frequency Array

For a discrete variable, the classification of its data is called a Frequency Array. Instead of grouping data into classes, this is a simple table that lists each distinct value of the variable and its corresponding frequency (how many times it appears in the dataset).
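A frequency array simply counts how often each distinct value occurs. The sketch below uses Python's standard `collections.Counter`; the household-size data is invented purely for illustration, and `frequency_array` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical data: number of members in each of 20 households (a discrete variable).
household_sizes = [3, 4, 2, 5, 4, 3, 6, 4, 3, 2, 4, 5, 3, 4, 7, 4, 3, 5, 2, 4]


def frequency_array(values):
    """Return (value, frequency) pairs for each distinct value, in ascending order."""
    counts = Counter(values)
    return sorted(counts.items())


for size, freq in frequency_array(household_sizes):
    print(f"{size}\t{freq}")
```

No classes are formed, so no class marks are needed and no information is lost: every distinct value keeps its own row.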

6. Bivariate Frequency Distribution

Sometimes, we collect data on two variables simultaneously for each element of a sample (e.g., collecting both the height and weight of each student). This is known as bivariate data. To summarize such data, we use a Bivariate Frequency Distribution.

This is a two-way table where the classes of one variable are arranged in rows and the classes of the other variable are arranged in columns. Each cell in the table shows the joint frequency—the number of observations that fall into that specific row and column class simultaneously. This type of table is also known as a contingency table and is crucial for studying the relationship between two variables, such as correlation.

Ad Expenditure (in '000 ₹) \ Sales (in Lakh ₹)	115–125	125–135	135–145	145–155	155–165	165–175	Total
62–64	2		1				3
64–66		3	1				4
66–68		1	2	1	1		5
68–70		1	1	2			4
70–72			1			1	4
Total	2	5	6	3	1	1	20

In this table, for example, there is 1 firm whose advertisement expenditure is between ₹64-66 thousand and whose sales are between ₹135-145 lakh.
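A bivariate frequency distribution can be built by locating each pair of observations in its row class and its column class and adding one to that cell. The sketch below reuses the class limits from the table above under the exclusive method; the six firms and the helper names `find_class` and `bivariate_table` are hypothetical and are not the data behind the table.

```python
# Class limits taken from the table above (exclusive method: lower <= x < upper).
AD_CLASSES = [(62, 64), (64, 66), (66, 68), (68, 70), (70, 72)]
SALES_CLASSES = [(115, 125), (125, 135), (135, 145), (145, 155), (155, 165), (165, 175)]


def find_class(value, classes):
    """Return the index of the class that contains value."""
    for i, (lower, upper) in enumerate(classes):
        if lower <= value < upper:
            return i
    raise ValueError(f"{value} lies outside all classes")


def bivariate_table(pairs):
    """Count joint frequencies: rows are ad-expenditure classes, columns are sales classes."""
    table = [[0] * len(SALES_CLASSES) for _ in AD_CLASSES]
    for ad, sales in pairs:
        table[find_class(ad, AD_CLASSES)][find_class(sales, SALES_CLASSES)] += 1
    return table


# Hypothetical observations: (ad expenditure in '000 ₹, sales in lakh ₹) for six firms.
firms = [(62.5, 118), (63.0, 140), (64.8, 130), (65.5, 138), (67.2, 150), (71.0, 170)]
table = bivariate_table(firms)
for (lower, upper), row in zip(AD_CLASSES, table):
    print(f"{lower}-{upper}", *row, sum(row), sep="\t")
col_totals = [sum(col) for col in zip(*table)]
print("Total", *col_totals, sum(col_totals), sep="\t")
```

The row totals give the frequency distribution of advertisement expenditure alone, and the column totals give that of sales alone; both must add up to the same grand total.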

7. Conclusion

Data collected from primary and secondary sources is initially raw and unorganized. To make this data useful for statistical analysis, it must first be classified. Classification is the process of organizing data into a structured format, most commonly a frequency distribution.

This process brings order to the data, making it concise and comprehensible. Understanding the techniques of classification and how to construct a frequency distribution for both continuous and discrete variables is a fundamental skill in statistics.

Recap

Classification is the process of organizing raw data to make it understandable.
A Frequency Distribution is a table that groups data into classes and shows their corresponding frequencies.
The Exclusive Method of classification excludes the upper limit of a class, while the Inclusive Method includes both limits.
Once data is classified, statistical calculations are based on the class mark (mid-point), not individual observations, which leads to some loss of information.
Classes should be formed such that the class mark is a good representative of the observations within that class. This may sometimes require using unequal class intervals.
A bivariate frequency distribution is used to summarize data for two variables simultaneously.

Exercises

This section contains questions for practice and self-assessment, designed to test the learner's understanding of the concepts discussed in the chapter, such as defining class midpoint, distinguishing between different types of distributions, creating frequency distributions from raw data, and understanding the concept of 'loss of information'.

Suggested Activity

This section provides ideas for practical projects, such as analyzing one's own past examination marks to see if the marks constitute a variable and to track improvement over time.
